Data ordering in large fast-Fourier transforms (FFT’s) is both conceptually and
implementationally difficult. This article describes a method of visualizing data
orderings as vectors of address bits, which enables the engineer to use more efficient
data orderings and reduce double-buffer memory designs to single-buffer designs.
In particular, this article details the difficulties and algorithmic solutions involved
in FFT lengths up to 4 megasamples (Msamples) and sample rates up to 80 MHz. 
Although the particular solutions mentioned may be directly applicable only to the 
particular system for which they were intended, the methodology by which these 
solutions were found could be useful to anyone confronted with similar problems. 


I. Introduction 

The Search for Extraterrestrial Intelligence (SETI) Pro- 
gram has recently completed the wire-wrap prototype 
wideband spectrum analyzer (WBSA) [1] and is about to 
start design of the sky-survey signal processor (SSSP) [2]. 
Both of these machines are high-speed fast-Fourier trans- 
form (FFT) processors followed by special-purpose signal- 
detection hardware. In the course of building the WBSA, 
a particular methodology grew for the design of FFT ma- 
chines (memory boards, in particular). These methods 
have cut the memory requirements for the proposed SSSP 
system by about 50 percent. 

This article begins with a brief description of the SSSP 
to familiarize the reader with the system from which exam- 
ples will be drawn. The backbone of this article is the data- 
reordering scheme, which is followed by a section outlining 
the method of replacing double buffers with single buffers. 
Finally, the design of the SSSP’s input buffer (INBUF) 
board is presented as an illustration of the design tech- 
niques. 


II. Brief Description of the SSSP System 

Figure 1 is a block diagram of one of the eight identical 
processors that will constitute the SSSP. Each processor of 
the system accepts complex samples at a rate of 80 MHz 
and performs an FFT on each group of 4 megasamples 
(Msamples). Although the processor’s internal clock runs 
at only 40 MHz, the processor can accept data at 80 MHz 
because it has two input lines, the high-input line and the 
low-input line. The high data point is the one that was 
sampled first and held until the low point was sampled. 
Every 25 nsec (40-MHz rate), two sample points enter the 
processor, and two frequency values from a previous spec- 
trum leave the processor at the back end of the pipe. 

A pipeline configuration was chosen to facilitate real- 
time processing. To accommodate a 4-megapoint (Mpoint) 
FFT, the data are configured as a 4096-by-1024 matrix, 
and the FFT is broken into two orthogonal FFT’s, one for 
the data in each column and one for the data in each row. 
Each FFT is further broken into radix-4 FFT stages. The 
two groups of these stages are called super-stages. 



Proper FFT implementation demands that the first 
FFT super-stage be performed on points that have the 
largest possible sampling interval between them, which 
indicates the columns of the matrix. Accordingly, the 
INBUF board transposes the matrix before the first of 
the two FFT super-stages. The result of the transposition 
is a 1024-by-4096 matrix, in which the first new row con- 
tains the first point of all the old rows, the second new row 
contains the second points, and so on. 

Using the convention that the matrix is always oriented 
so that FFT’s are performed on rows, the first super- 
stage performs an FFT on each 4-kilopoint (kpoint) row. 
Then the matrix is transposed by the corner-turn mem- 
ory (CTM) board to prepare it for the second super-stage. 
The 1-kpoint FFT’s are performed on each of the new 
rows, resulting in a 4-Mpoint spectrum of frequency bins. 

The real adjust (RADJ) board reorganizes (but does 
not transpose) these frequency values for its own purposes. 
The unscrambler (UMR) board undoes the shuffling done 
by the RADJ board and transposes the matrix for a final 
time, putting the frequencies in sequential order. 

III. Relating Data Order to Addressing 

Consider the sampled data stream as an array, in the
case of the SSSP, a $2^{22}$-point array. As such, each data
point can be specified by a 22-bit index number. Since
the SSSP is a dual-rail system, $2^{21}$ of the samples come in
on the high rail, and the other $2^{21}$ enter on the low rail.
The 22-bit index number can be divided as follows: One
bit, known as the hi/lo bit, indicates which rail the data
came in on, 0 for high, 1 for low. (As a convention, this
bit is always the least-significant bit, lsb.) The remaining
21 bits form a 21-bit binary counter, which increments
with each clock cycle. As a design convention, the bits
of this data counter are labeled $C_0$ (lsb) through $C_{20}$, the
most-significant bit (msb).

At each clock cycle in Fig. 2, two data points enter the 
SSSP, one on the high data rail and one on the low data 
rail. Each of these data points has its own 22-bit index 
number, and they must be different. Because they enter 
the board at the same time, the counter bits (the most- 
significant 21 bits) are the same. The only difference is the 
hi/lo bit, which will be a 0 for the data point on the high 
rail and a 1 for the data point on the low rail. Figure 2 
depicts this graphically for three clock cycles in the middle 
of a spectrum. 

At each of the three clock cycles, the value of the time 
counter is given in both decimal and binary form. Below
the time counter are sections of the high- and low-data
channels, one data point per channel per clock cycle. The 
data-order index of each data point is given in decimal 
and binary forms inside the data box. Because data in 
this example are in sequential order, the bits in the time 
counter are in the same order as they appear in the data- 
order index. If a different order had been chosen, the bits 
would have been scrambled, but sequential order is easiest 
to visualize. In sequential order, the index for data on 
the high channel will always be exactly double the time 
counter, and the index for data on the low channel will 
always be one more than the simultaneous data on the 
high channel. 
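
The relationship between the time counter and the two data-order indices can be checked with a short Python sketch. The function name `sequential_indices` is hypothetical and used only for illustration; the values are the three clock cycles shown in Fig. 2.

```python
def sequential_indices(time_counter):
    """Return the (high, low) data-order indices for one clock cycle.

    In sequential order the 21 counter bits form the upper bits of the
    22-bit index, and the hi/lo bit (0 = high, 1 = low) is the lsb.
    """
    high = (time_counter << 1) | 0
    low = (time_counter << 1) | 1
    return high, low


for t in (2565, 2566, 2567):
    high, low = sequential_indices(t)
    print(f"T={t:021b}  high={high:022b} ({high})  low={low:022b} ({low})")
```

The high index is always twice the time counter, and the low index is one more than the high index.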

Once this 22-bit index number is established for a given 
data point and the addressing algorithm is known, every- 
thing that happens to that data point is also known. The 
data point is stored in a memory array using the 22-bit 
number as an address, so it is known where the data are 
stored. When the data are read out of memory, their chan- 
nels (high or low) and their positions in the data stream 
are based on those same 22 bits. In the 4-bit example 
in Fig. 3, data (in the boxes) enter in sequential order. 
Above each box is its 4-bit data-order index. The 3-bit 
time counter always matches the 3 left-most index bits,
and the right-most bit indicates the high or low channel.
The resultant pattern, $A_0$ through $A_3$, differentiates each
data point by identifying its position in the sequential data 
stream (0-15). 

As an example, these points are output at the bottom
of Fig. 3 in high-low (HL) order. An N-point data stream
in HL order looks like:

HI CHANNEL: 0, 1, 2, 3, 4, 5, ...

LO CHANNEL: N/2, N/2 + 1, N/2 + 2, N/2 + 3, N/2 + 4, N/2 + 5, ...

To organize data points in HL order, the msb of the
data-order index moves to the least-significant position.
The other bits are all in sequential order, but they are
shifted to the left by one bit. Again, the time counter
matches with the left-most 3 index bits, but now those
bits are $A_2$, $A_1$, and $A_0$. (They were $A_3$, $A_2$, and $A_1$ in
sequential order.) The right-hand bit that indicates the
high or low channel is now $A_3$. However, when these bits
are written in their correct numerical order, $A_3 A_2 A_1 A_0$,
they are the correct binary representation of each
data-order index (the large numbers inside the boxes).
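
A minimal Python sketch of the HL reordering for the 4-bit example follows. The function name `hl_data_index` and its parameters are hypothetical; the sketch only restates the bit movement described above.

```python
def hl_data_index(time_counter, hi_lo, n_bits=4):
    """Data-order index of the point that leaves at (time_counter, hi_lo).

    In HL order the hi/lo bit supplies the msb of the data-order index,
    and the time counter supplies the remaining n_bits - 1 low-order bits.
    """
    return (hi_lo << (n_bits - 1)) | time_counter


high = [hl_data_index(t, 0) for t in range(8)]
low = [hl_data_index(t, 1) for t in range(8)]
print("HI CHANNEL:", high)
print("LO CHANNEL:", low)
```

For N = 16 the high channel carries points 0 through 7 and the low channel carries points N/2 through N - 1.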

In general, any data-reordering scheme (from sequential
order to HL order in the above example) can be viewed as
a transformation of the data-index values. In this way, the 
SSSP data index is viewed as a 22-bit vector and the trans- 
formation as a 22-by-22 matrix. The resultant new vector 
is composed of 22 new data-index values. Each new index 
value is a function of one or more of the original index 
values. For simple reordering (as in the above example), 
each new index value is a function of exactly one old index 
value, and the transformation is simply a reordering of the 
bits of the data index. Other, more complex, transforma- 
tions are also useful. 

IV. Addressing Single Buffers and Double Buffers

The SSSP is based on the decision to use single buffers 
instead of double buffers wherever possible, sacrificing sim- 
plicity for the sake of hardware savings, board space, and 
lower power consumption. In the case of the INBUF 
board, which will be discussed later, this amounts to 
8 Mbytes of memory, saving $8400 and a maximum of 
18 W for each of the 8 copies that will be made. Sav- 
ings for the CTM board are treble this, a net total of over 
$200,000 and 384 W for the 8 copies of the CTM. 

Figure 4 shows the way in which the use of single buffers 
differs from double buffers. In each example, the WRITE 
line tells which spectrum is being written at any particular 
time, and the READ line tells which one is being read. 
(Notice that the first spectrum to be read is unlabeled 
because it contains no real data and is discarded.) Below 
the WRITE and READ lines, the addressing lines tell the 
addressing used for each buffer. 

Notice that when using a double buffer, only two ad- 
dressing orders are required, a read order and a write or- 
der. Whenever a spectrum is being read out of one buffer 
(using the read addressing), another spectrum is being 
written to the other buffer (using the write addressing). 
Then the latter buffer is read, while a third spectrum is 
being written into the former buffer. 

For simplicity’s sake, it is often convenient to write the 
incoming data in sequential order and read them out in 
transformed order; call the transformed order T. If the 
order of the data can be written as an N-point vector, then
the transformation between two orders can be written as
an N-by-N matrix. Thus, each data order in Fig. 4 is
labeled as sequential or as a power of the transformation
matrix, T, a 5-by-5 matrix. The columns of each matrix 
correspond to the different address lines of the memory 
buffer, A to E from top to bottom. The order of these 
columns mirrors the order of the address bits in the 5-bit 
data-order index. 

It is important to realize that if data are to be read out
in sequential order after the transformation, they would
have to be written in inverse-T order. This is a very
important basic concept: Data can be written in any order,
as long as the transformation between the written order
and the read order is T. For example, if instead of being
written sequentially, the data order were transformed so
that data were written in W order, they would have to be
read out in R order such that R = T * W.

This idea is the basis for the single-buffer addressing 
schemes. Using read-write cycles, in which a given address 
is read then overwritten, the first spectrum is written se- 
quentially. However, when it is read out later (again using 
read-write cycles), the second incoming spectrum must be 
written in the same order that the first is going out. (If 
new data were written in a different order, some memory 
locations would be rewritten before they were read.) This 
means that when Spectrum 0 is read out in transformed 
order, T, Spectrum 1 is being written in T order. 

This is the problem. If the second spectrum is read out
in T order, as in the double-buffer example, it would
appear in the same order it went in, sequentially. So it must
be read out in T * T order. This means that Spectrum 2
will be written in T * T order and read in T * T * T order,
Spectrum 3 will be written in T * T * T order, and so
on. Since there are only a finite number of ways to
reorganize the data stream, the pattern will eventually repeat.
Reworded in more rigorous terms, a finite P exists such
that $T^P = T^0 = I$, where I is the 5-by-5 identity matrix
representing sequential order. However, if P is very
large, the cycle will take a long time to repeat, requiring
extra counters, and leading to complicated address
equations and confused engineers. The trick is to simplify the
address patterns as much as possible by minimizing P.
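
The cycle length P can be found by applying T repeatedly until sequential order returns. The Python sketch below does this for the 5-bit example of Fig. 4, in which T takes address order ABCDE to CADEB. The helper names and the list encoding of T are assumptions made for illustration.

```python
def apply_transform(order, perm):
    """Apply one transformation: new position i takes the bit at old position perm[i]."""
    return "".join(order[i] for i in perm)


def address_orders(start, perm):
    """List the successive address orders until the starting order recurs."""
    orders = [start]
    current = apply_transform(start, perm)
    while current != start:
        orders.append(current)
        current = apply_transform(current, perm)
    return orders


T = [2, 0, 3, 4, 1]  # encodes ABCDE -> CADEB, as in Fig. 4
orders = address_orders("ABCDE", T)
P = len(orders)
print(orders, "P =", P)
```

For this T the single buffer steps through five distinct address orders before repeating, so a spectrum counter that counts to 5 would be needed.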

V. Interleaving 

This section deals with the practical constraints of the 
present technology. The first difficulty one is likely to run 
into is the speed of available memories. The particular 
memory modules used in this processor have a 100-nsec 
read/write cycle time. (Faster memories are available but 
not in sizes that are practical for the amount of data re- 
quired.) Because the clock period is 25 nsec, new data can 
be written only every fourth clock cycle. In the meantime,
data must be stored in registers until the start of the next 
100-nsec cycle (see Fig. 5). Because 2 new points are ready 
to be written each clock cycle (25 nsec), 8 points accumu- 
late every 100 nsec. All 8 data points must be written at 
the same time, requiring 8 memory modules operating in 
parallel (see Fig. 6). 

Likewise, 8 data points are read out during each 
100-nsec read/write cycle. Two data points are selected 
by the multiplexer during each of the next four clock cy- 
cles. Meanwhile, another 8 points are being read, and the 
cycle repeats. The following examples will refer to writing 
data, but it is to be understood that reading data is an 
analogous operation that is happening simultaneously. 

If the sequential example from Fig. 3 were implemented 
using Fig. 6, the first point on the high channel would be 
stored in the H0 memory slice, the second point on the
high channel in the H1 memory slice, and so on. Something
must designate which memory slice a point will be
written to. In this simple example, the answer is obvious. 
The bottom 2 counter bits can be fed directly to the de- 
multiplexer to specify where to steer the data. (This is 
a slight oversimplification. In actual implementation, the 
input registers must include clock enables that depend on 
the same 2 bits.) 

More complex situations demand a more formal
approach. The 8 memory slices can be distinguished with
a 3-bit number, $M = (M_2, M_1, M_0)$. Since each group of
8 data points is written simultaneously, each point within
the group must be written to a different memory slice.
Thus, the 8 different data indices must each designate a
different memory. It is known that these 8 points all
arrived within 4 clocks of each other. That means that all the
bits of the indices will be the same except for the 2
least-significant (fastest changing) counter bits and the hi/lo
bit. These 3 bits must all be what are called
memory-selection bits. The memory-slice number, M, is a function
of those 3 bits, and the 8 possible values of the 3 bits must
correspond to every possible value of M. To clarify
notation, M is the 3-bit number that specifies the memory
slice, $M_{number}$ is one of the bits of M, and $M_{letter}$ is a
memory-selection bit within the data index number. Each
$M_{number}$ is a function of one or more $M_{letter}$ bits.

Let the hi/lo bit be named $M_a$, the least-significant
counter bit $M_b$, and the second counter bit $M_c$. In the
example of Fig. 3, the simplest solution is

$M_2 = M_a$

$M_1 = M_c$

$M_0 = M_b$

Under this scheme, the data from the high channel will fill
memory slices 0 to 3 in sequential order, and the data from
the low channel will fill memory slices 4 to 7 in order.
However, it is perfectly reasonable to use a different ordering.
For example, the functions could have been defined

$M_2 = M_a$

$M_1 = M_b$

$M_0 = M_b \oplus M_c$

where $\oplus$ denotes “exclusive-or” (the C-language ^ operator).

At time t = 0, the high-channel data will be written to
memory slice 0 and the low channel to slice 4. At t = 1,
$M_b = 1$ and $M_c = 0$, so the high-channel data will be
written to slice 3 and the low channel to slice 7. At t = 2,
high-channel data will be written to slice 1 and low data
to slice 5. Finally, at t = 3, high and low data are written
to slices 2 and 6, respectively. The cycle repeats at times
t = 4, 8, 12, and so on.
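
Both slice assignments can be checked with a few lines of Python. The function names are hypothetical; the bit definitions follow the text (hi/lo bit, least-significant counter bit, second counter bit).

```python
def selection_bits(t, hi_lo):
    """Return (Ma, Mb, Mc): hi/lo bit, counter bit 0, counter bit 1."""
    return hi_lo, t & 1, (t >> 1) & 1


def slice_simple(m_a, m_b, m_c):
    """Simplest assignment: M2 = Ma, M1 = Mc, M0 = Mb."""
    return (m_a << 2) | (m_c << 1) | m_b


def slice_xor(m_a, m_b, m_c):
    """Alternative assignment: M2 = Ma, M1 = Mb, M0 = Mb xor Mc."""
    return (m_a << 2) | (m_b << 1) | (m_b ^ m_c)


for t in range(4):
    high = slice_xor(*selection_bits(t, 0))
    low = slice_xor(*selection_bits(t, 1))
    print(f"t={t}: high -> slice {high}, low -> slice {low}")
```

Either assignment sends the 8 points of a 100-nsec group to 8 different slices, which is the only hard requirement.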


VI. INBUF Board 

The first (and simplest) memory board in the system 
is the INBUF board. Data enter sequentially: 

HI CHANNEL: 0, 2, 4, 6, 8, ...

LO CHANNEL: 1, 3, 5, 7, 9, ...

This ordering can be represented as 22 address bits, la- 
beled A to V, with V being the least-significant (hi/lo) 
bit (see Fig. 7). The data can also be looked at as be- 
ing in a large 4096-by-1024 matrix, without altering the 
ordering at all. Step 1 is a purely conceptual step. The 
most-significant 12 bits designate the row number. The 
least-significant 10 bits specify the column (or position 
within the row). The row bits are separated from the 
column bits by a hyphen. As a standard convention, the 
least-significant bits are always row bits unless otherwise 
specified. 

To fully reap the benefits of doing an FFT, the first
stage operates on the samples that are most widely
separated in time [3]. Subsequent stages work on data points
nearer each other in time, and the last stage operates on
sequential samples. In terms of the matrix FFT, the first
super-stage should operate on the vector that contains
samples $(S_0, S_{1\mathrm{k}}, S_{2\mathrm{k}}, \ldots, S_{4\,\mathrm{Meg}-1\mathrm{k}})$. This is the
first column of the matrix. (The first super-stage operates,
of course, on each individual column, not just the first.)

The FFT boards that follow the INBUF operate on 
groups of 4 kpoints of data at a time. To minimize the 
amount of memory required within the FFT boards, those 
4 kpoints should enter sequentially, so the INBUF board 
transposes the matrix in Step 2. The old row bits become 
column bits and vice versa. 
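
The transposition in Step 2 amounts to swapping the 12 row bits with the 10 column bits of the index. A minimal Python sketch, with the hypothetical helper name `sample_at`, shows which sample appears at each position of the transposed stream (the hi/lo and radix-4 adjustments of the later steps are not included):

```python
ROW_BITS, COL_BITS = 12, 10  # 4096 rows by 1024 columns before transposition


def sample_at(position):
    """Sequential sample index found at `position` of the transposed stream."""
    new_row = position >> ROW_BITS               # new rows hold 4096 points each
    new_col = position & ((1 << ROW_BITS) - 1)
    # New column = old row, new row = old column.
    return (new_col << COL_BITS) | new_row


print([sample_at(p) for p in range(4)])   # start of the first new row
print(sample_at(4096))                    # start of the second new row
```

The first new row is made up of samples 1024 apart, which is the first column of the original matrix.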

The next reorganizational step (Step 3) is dictated by a 
knowledge of the inner workings of the FFT boards. They 
have been designed to do a separate FFT on each chan- 
nel, with no data being shared between the high and low 
channels. Thus, a given row appears on one channel or 
the other, not both. This requires that the hi/lo bit be a 
row bit. The least-significant row bit moves to the hi/lo 
position. 

Step 4 also depends upon the particular architecture of 
the FFT boards. They contain cascaded stages of radix-4 
FFT’s, which do 4-point FFT’s on groups of the incom- 
ing data. The first stage does its work on the following 
quadruples: 

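$(S_0, S_{1\mathrm{Meg}}, S_{2\mathrm{Meg}}, S_{3\mathrm{Meg}})$,

$(S_1, S_{1\mathrm{Meg}+1}, S_{2\mathrm{Meg}+1}, S_{3\mathrm{Meg}+1})$,

$(S_2, S_{1\mathrm{Meg}+2}, S_{2\mathrm{Meg}+2}, S_{3\mathrm{Meg}+2})$, etc.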

The order in which these quadruples appear is not impor- 
tant, but the order of the points within each quadruple is. 
So the two most-significant column bits move to the least- 
significant positions, creating what is known as high-low 
radix-4 order. 
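
The combined effect of Steps 2 through 4 can be sketched in Python by treating the read order as a string of bit labels. The string `READ_ORDER` below is an assumption reconstructed from the text: row bits M through U, then column bits C through L, then A and B, with V in the hi/lo position. The helper name `sample_at` is also hypothetical.

```python
import string

LABELS = string.ascii_uppercase[:22]  # "A".."V"; A is the msb, V the lsb (hi/lo)
READ_ORDER = "MNOPQRSTU" + "CDEFGHIJKL" + "AB" + "V"  # assumed order after Step 4


def sample_at(position):
    """Sequential sample index read out at the given 22-bit stream position."""
    sample = 0
    for place, label in enumerate(READ_ORDER):
        bit = (position >> (21 - place)) & 1   # bit of the stream position
        weight = 21 - LABELS.index(label)      # where that label sits in the sample index
        sample |= bit << weight
    return sample


MEG = 1 << 20
high = [sample_at(t << 1) for t in range(4)]        # hi/lo = 0
low = [sample_at((t << 1) | 1) for t in range(4)]   # hi/lo = 1
print(high, low)
```

Under this assumed order, the first four points on the high channel form the quadruple starting at sample 0, and the first four on the low channel form the quadruple starting at sample 1.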

This completes the required reordering of the INBUF 
board. If data are written sequentially, A through V (as at 
the top of Fig. 7), and read using the addressing scheme at 
the bottom of the figure, M through V, all the necessary 
operations will be performed in a single step. Unfortu- 
nately, if this addressing pattern is implemented using a 
single buffer, the pattern only repeats every 21 spectra (as
seen in Fig. 8). The worst feature of this particular design
is that each address bit appears in a different column
each spectrum. Thus, each and every address bit is a func- 
tion of 21 different data-counter bits and a 5-bit spectrum 
counter, yielding huge unwieldy logic equations. With a 
little ingenuity, this can be simplified greatly. 

Figure 9 demonstrates the design process of an efficient 
scheme. Step 1 is the same as before, and Steps 2 and 3 are 
similar. As noted before, the order in which the quadruples 
come is irrelevant. In fact, the order in which the rows 
come is also irrelevant. The only bits that are fixed are the 
2 least-significant column bits, A and B. The other column 
bits are all interchangeable, as are the row bits, so they are 
all left blank. (Column bits are not interchangeable with 
row bits because a complete row must go into each channel 
of the FFT before the next row enters. This ensures that 
a separate FFT is being done on every single row. Thus, 
row and column bits are specified as such although they 
are not specifically identified.) 

The goal of this method is to keep the addressing
equations as simple as possible; whenever possible, address bits
will remain in the same locations. This is possible only
in the cases of the 3 least-significant row bits (which
become the 3 most-significant column bits) and the
least-significant column bit (which becomes the least-significant
row bit). These bits are J, K, L, and V, respectively. In
Step 5, these bits are assigned to remain in their respective
positions.

A and B are the only bits that are constrained to move
to particular positions. To minimize the cycle number, P,
these two bits must return to their original positions as
soon as possible, preferably after the next cycle. Thus,
it is desirable that whichever bits lie in the second and
third positions from the right in one spectral addressing
will move to the two most-significant positions one
spectrum later. By extension, T and U, which were in the
second and third positions before the transformation, are
required to be in the most-significant positions after the
transformation. See Step 6.

By Step 7, all the constraints of the INBUF board have 
been satisfied, so a column order that is beneficial to the 
FFT boards is used. (This particular order allows the four 
FFT boards that comprise the two stages of the FFT to be 
built as identical copies of each other and minimizes the 
amount of memory needed to reorganize the data between 
each internal stage.) Once this order has been determined, 
the rest of the design follows from matching pairs of ad- 
dress bits in the same manner that A and T were matched 
and B and U were matched. This ordering scheme repeats 
after two spectra, which can easily be verified. 

Now the design of the transformation is complete, but
the implementational design remains. The implementational
phase occurs when the 22 data-index bits, originally
dubbed A through V, are designated as memory-selection
bits or addressing bits. Because the design requires an 
eight-way memory interleave, three of the bits will be 
reserved for memory selection. The memories are each 
512 kpoints deep, requiring 19 addressing bits (although 
some of these will also be used for memory selection). 

Figures 10 and 11 describe the process of renaming the
data-index bits. A bit chosen as an addressing bit is
renamed $A_{number}$, and a memory-selection bit is renamed
$M_{letter}$. These are just new names for the bits that were
previously designated A through V. The old names were
only placeholders; the new names also give information
about how the transformation will be implemented.

There is a lot of freedom here, but the most straight- 
forward choice is to assign the 3 memory-selection bits in 
the right-hand positions and the 19 address bits in order 
to the left (see Fig. 10). This works fine for spectrum 0. 
However, when the address bits have been transformed for 
spectrum 1, only one memory-selection bit remains in the 
three right-hand spots. This means that when a group of 
8 data points comes in, the board will attempt to write 
them all to only two memory slices. 

This problem is alleviated by stipulating that A and
B must be memory-selection bits as well as address
bits (see Fig. 11). $M_0$ is now a function of both A and T,
“exclusive-or”-ing bit A with bit T. This ensures that out
of each group of 8 data points, half will be directed to even
memory slices and half to odd memory slices. Likewise, B
is “exclusive-or”-ed with U to determine $M_1$. Now each
group of four clocks (100 nsec) accesses all eight memory
slices. Note: Only groups that start on a 100-nsec boundary
($C_1 = C_0 = \mathrm{hi/lo} = 0$) need to be considered.
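
The claim that the “exclusive-or” assignment reaches all eight slices in both spectra can be checked by enumeration. In the Python sketch below, $M_0$ and $M_1$ follow the text; taking $M_2$ directly from bit V is an assumption, as is the choice of which three bits vary within a group in each spectrum (T, U, V in spectrum 0; A, B, V in spectrum 1).

```python
from itertools import product


def memory_slice(a, b, t, u, v):
    """Slice number with M0 = A xor T, M1 = B xor U, and (assumed) M2 = V."""
    m0 = a ^ t
    m1 = b ^ u
    m2 = v
    return (m2 << 2) | (m1 << 1) | m0


def slices_spectrum0(a, b):
    """Slices hit by one group when T, U, V vary and A, B are held fixed."""
    return {memory_slice(a, b, t, u, v) for t, u, v in product((0, 1), repeat=3)}


def slices_spectrum1(t, u):
    """Slices hit by one group when A, B, V vary and T, U are held fixed."""
    return {memory_slice(a, b, t, u, v) for a, b, v in product((0, 1), repeat=3)}


for x, y in product((0, 1), repeat=2):
    print(x, y, sorted(slices_spectrum0(x, y)), sorted(slices_spectrum1(x, y)))
```

Whatever values the fixed bits take, each group of 8 points lands on 8 distinct slices, because “exclusive-or” with a constant only relabels the slices.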

Now the design of the INBUF is complete. The Boolean
equations for each address bit can be read directly off of
Fig. 11. For instance, $A_0$ is equivalent to $C_2$ during
spectrum 0, and it is equivalent to $C_{17}$ during spectrum 1:

$A_0 = (C_2 * S') + (C_{17} * S)$

where S is the 1-bit spectrum counter, $C_{number}$ is a bit
selected from the 21-bit data counter, ' is logical “not,” +
is logical “or,” and * is logical “and.” Similar equations
can be found for $A_1$ to $A_{16}$.
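
The equation for address bit 0 is a two-way multiplexer controlled by the spectrum counter. A minimal Python sketch, using the hypothetical function name `address_bit_0`, makes the selection explicit:

```python
def address_bit_0(counter, s):
    """Address bit 0: counter bit 2 when S = 0, counter bit 17 when S = 1."""
    c2 = (counter >> 2) & 1
    c17 = (counter >> 17) & 1
    not_s = s ^ 1
    return (c2 & not_s) | (c17 & s)


print(address_bit_0(0b100, 0), address_bit_0(0b100, 1))
print(address_bit_0(1 << 17, 0), address_bit_0(1 << 17, 1))
```

Because the cycle repeats after two spectra, a single bit S is enough to select between the two sources for every address line.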

Notice that two address bits occur in the least-significant
three places during spectrum 1. This causes
some conceptual difficulties and requires different address
lines to different slices of memory. But remember that
$A_{18}$ is defined as being equivalent to $M_e$, which in turn is
equivalent to $M_0 \oplus M_c$, where $\oplus$ is “exclusive-or.” Thus:

$A_{18} = C_{20}$

for memory slices 0, 2, 4, and 6, and

$A_{18} = (C_{20} * S') + (C_{20}' * S)$

for memory slices 1, 3, 5, and 7. Equations for $A_{17}$ can be
similarly derived.

VII. Different Transformations 

All previously mentioned transformations reorganize
the address bits but leave the bits unchanged. Sometimes,
the designer may wish to read out the data using address
bits that are functions of, but not identical to, the original
address bits. The simplest example is reverse order. If
data are written sequentially, using address bits A, B, and
C, they can be read out in reverse order by just inverting
the address bits, as shown below:

Counter (A, B, C)              000  001  010  011  ...
Data in                          0    1    2    3  ...
Reverse counter (A', B', C')   111  110  101  100  ...
Data out                         7    6    5    4  ...

Another useful transformation is the downcounter
transformation, which is easily implemented by substituting
a downcounter for the regular counter. The bits of the
downcounter are indicated as A, B, C.

Counter (A, B, C)        000  001  010  011  ...
Data in                    0    1    2    3  ...
Downcounter (A, B, C)    000  111  110  101  ...
Data out                   0    7    6    5  ...

These two operations have one thing in common; they 
are reversible. That means that after the address bits are 
transformed, no information is lost; the original address 
bits can still be recovered. Reversibility is important be- 
cause it prevents memory slices from attempting to write 
multiple data points to the same spot. There are many
reversible operations, such as “exclusive-or”-ing two bits 
(as in Fig. 11). 
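
Reversibility can be tested directly: a transformation of the address is reversible exactly when no two counter values map to the same address. The Python sketch below checks the two 3-bit examples above and contrasts them with a deliberately lossy mapping. All function names are hypothetical.

```python
N_BITS = 3
MASK = (1 << N_BITS) - 1


def reverse_order(counter):
    """Invert every address bit (A', B', C')."""
    return counter ^ MASK


def downcounter(counter):
    """Value of a downcounter that starts at 0 and wraps modulo 2**N_BITS."""
    return (-counter) & MASK


def is_reversible(transform):
    """True if every counter value maps to a distinct address."""
    addresses = {transform(c) for c in range(1 << N_BITS)}
    return len(addresses) == 1 << N_BITS


print([reverse_order(c) for c in range(4)])
print([downcounter(c) for c in range(4)])
print(is_reversible(reverse_order), is_reversible(downcounter))
print(is_reversible(lambda c: c & 0b110))  # drops a bit, so two points collide
```

The last mapping discards the least-significant bit, so pairs of data points would be written to the same location.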

VIII. Limitations 

The use of single buffers is limited by two consid- 
erations. First, the interleaving required to implement 
read/write cycles at the proper speed may require as much 
memory as a double buffer. In such a case, no savings 
would be realized by using a single buffer. 

The other consideration is the complexity of the ad- 
dressing, which may rise to a point of impracticality. 


IX. Conclusion 

Visualizing a data stream as a vector of addressing bits 
allows the designer to treat data reorganization as a matrix 
transformation on that vector. This allows the designer 
to easily manipulate many transformations at once and 
find data orderings that are beneficial to other boards in 
the system. It also allows the designer to find address 
patterns that make single-buffer memory banks possible 
(as opposed to double-buffer banks), further reducing the 
necessary hardware. 

Once an address bit transformation has been chosen, 
the implementation can be easily designed using methods 
outlined in this article. 

Fig. 1. SSSP.


                TIME = 2565               TIME = 2566               TIME = 2567
                000000000101000000101     000000000101000000110     000000000101000000111

HIGH CHANNEL    INDEX = 5130              INDEX = 5132              INDEX = 5134
                0000000001010000001010    0000000001010000001100    0000000001010000001110

LOW CHANNEL     INDEX = 5131              INDEX = 5133              INDEX = 5135
                0000000001010000001011    0000000001010000001101    0000000001010000001111


Fig. 2. Relating the time counter to the index number. 


INPUT - SEQUENTIAL ORDER ($A_3 A_2 A_1 A_0$)

T                          000   001   010   011   100   101   110   111
HIGH CHANNEL (HI/LO = 0)   0000  0010  0100  0110  1000  1010  1100  1110
  data point                  0     2     4     6     8    10    12    14
LOW CHANNEL (HI/LO = 1)    0001  0011  0101  0111  1001  1011  1101  1111
  data point                  1     3     5     7     9    11    13    15

OUTPUT - HL ORDER ($A_2 A_1 A_0 A_3$)

T                          000   001   010   011   100   101   110   111
HIGH CHANNEL (HI/LO = 0)   0000  0010  0100  0110  1000  1010  1100  1110
  data point                  0     1     2     3     4     5     6     7
LOW CHANNEL (HI/LO = 1)    0001  0011  0101  0111  1001  1011  1101  1111
  data point                  8     9    10    11    12    13    14    15



Fig. 3. Relating input order to a binary counter. 

(a) Double buffer
WRITE:  SPEC 0, SPEC 1, SPEC 2, SPEC 3, SPEC 4, SPEC 5
READ:   (discarded), SPEC 0, SPEC 1, SPEC 2, SPEC 3, SPEC 4
BUFFER A and BUFFER B ADDRESSING: each buffer alternates between
WRITE = SEQ (address order ABCDE, identity matrix) and
READ = T (address order CADEB), where

      0 1 0 0 0
      0 0 0 0 1
T  =  1 0 0 0 0
      0 0 1 0 0
      0 0 0 1 0

(b) Single buffer
WRITE:  SPEC 0, SPEC 1, SPEC 2, SPEC 3, SPEC 4, SPEC 5
READ:   (discarded), SPEC 0, SPEC 1, SPEC 2, SPEC 3, SPEC 4
READ/WRITE ADDRESSING for successive spectra:
SEQ = $T^0$ (ABCDE), $T^1$ (CADEB), $T^2$ (DCEBA), $T^3$ (EDBAC), $T^4$ (BEACD), $T^5 = T^0$ (ABCDE)



Fig. 4. How to address memories: (a) using a double buffer and (b) using a single buffer. 


HIGH DATA IN:  0, 2, 4, 6, ..., 30
LOW DATA IN:   1, 3, 5, 7, ..., 31
WRITE CYCLE:   0-7, 8-15, 16-23, ... (one write cycle per 100 nsec)

Fig. 5. Timing of eight-way interleave. 

Fig. 6. Memory buffer divided into eight slices.
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HZmcOmw r~> — izmcOmco r-> — izmcOmO) r-> — izmCOmw 


[Figure 7 graphic not reproduced. The figure traced the 22 address bits (A through V) and the H/L channel bit through the following steps:

INTO INBUF: 4 M (SEQUENTIAL)
STEP 1 - MATRICIZE: 4 K (SEQUENTIAL) x 1 K (SEQUENTIAL); row bits A through L, column bits M through V
STEP 2 - CORNER-TURN: 1 K (SEQUENTIAL) x 4 K (SEQUENTIAL); NEW ROW = OLD COLUMN, NEW COLUMN = OLD ROW
STEP 3 - PUT SEPARATE ROWS ON EACH CHANNEL: 1 K (SEQUENTIAL) x 4 K (SEQUENTIAL ON EACH CHANNEL); row bits M through U, column bits A B C D E F G H I J K L
STEP 4 - PUT COLUMNS IN HL-RADIX-4 ORDER: 1 K (SEQUENTIAL) x 4 K (HL-RADIX-4 ON EACH CHANNEL); row bits M through U, column bits C D E F G H I J K L A B]

Fig. 7. INBUF data reordering.
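
Steps 1 and 2 of Fig. 7 can be illustrated as operations on address-bit fields. The sketch below assumes that the row index occupies the high-order 12 bits of the 22-bit sequential address and the column index the low-order 10 bits; that layout, and the function name `corner_turn`, are assumptions made for illustration rather than details taken from the article. A corner-turn then amounts to exchanging the two bit fields, with no arithmetic on the data itself.

```python
def corner_turn(addr, row_bits=12, col_bits=10):
    """Swap the row and column bit fields of a matricized address.

    Assumed layout of addr: [row (row_bits) | column (col_bits)].
    Layout of the result:   [column (col_bits) | row (row_bits)],
    i.e. new row = old column and new column = old row.
    """
    row = addr >> col_bits               # step 1: matricize, high field
    col = addr & ((1 << col_bits) - 1)   # step 1: matricize, low field
    return (col << row_bits) | row       # step 2: corner-turn


if __name__ == "__main__":
    # Sample 1 (row 0, column 1) lands at row 1, column 0 of the
    # turned 1 K x 4 K matrix.
    print(corner_turn(1))
```

Because the operation only rearranges bits, it is a one-to-one mapping of the address space, and applying it again with the field widths exchanged restores the original address.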




[Figure 8 graphic not reproduced. The figure tabulated, against the counter bits (C 20 downward), the ordering of address bits A through V for successive spectra; the individual entries are not recoverable from the source.]

Fig. 8. INBUF transformation sequence.


[Figure 9 graphic not reproduced. The figure traced the address bits and the H/L channel bit through the following steps; the bit orderings themselves are not recoverable from the source:

INTO INBUF: 4 M (SEQUENTIAL)
STEP 1 - MATRICIZE: 4 K (SEQUENTIAL) x 1 K (SEQUENTIAL)
STEP 2 - CORNER-TURN: 1 K (TBD) x 4 K (TBD); NEW ROW = OLD COLUMN, NEW COLUMN = OLD ROW
STEP 3 - SEPARATE ROWS ON EACH CHANNEL: 1 K (TBD) x 4 K (TBD ON EACH CHANNEL)
STEP 4 - BITS A AND B ARE FIXED: 1 K (TBD) x 4 K (TBD ON EACH CHANNEL)
STEP 5 - LEAVE J, K, L, AND V ADDRESS BITS IN THE SAME POSITIONS: 1 K (TBD) x 4 K (TBD ON EACH CHANNEL)
STEP 6 - CHOOSE T AND U TO MINIMIZE COMPLEXITY: 1 K (TBD) x 4 K (TBD ON EACH CHANNEL)
STEP 7 - PICK A COLUMN ORDER THAT IS CONVENIENT FOR THE FFT: 1 K (TBD) x 4 K (RANDY’S COLUMN ORDER ON EACH CHANNEL)
STEP 8 - PICK A ROW ORDER TO MINIMIZE COMPLEXITY: 1 K (RANDY’S ROW ORDER) x 4 K (RANDY’S COLUMN ORDER ON EACH CHANNEL)]

Fig. 9. Revised INBUF data reordering.




[Figure 10 graphic not reproduced. The figure showed, for SPECTRUM = 0 and SPECTRUM = 1, how the counter bits (C 20 through C 0 and HI/LO) map onto the memory address bits (A 18 through A 0) and the bits M a, M b, and M c. The legible rows for SPECTRUM = 1 read:

data bits: T U R S P Q M N O J K L G H I E F C D A B V
address bits: A 1, A 0, A 3, A 2, A 6, A 5, A 4, A 9, A 8, A 7, A 12, A 11, A 10, A 14, A 13, A 16, A 15, A 18, A 17, followed by M c, M b, M a

with M 2 = M a, M 1 = M b, and M 0 = M c. The SPECTRUM = 0 rows are not recoverable from the source.]

Fig. 10. Revised INBUF transformation sequence.



[Figure 11 graphic not reproduced. The figure showed the mapping of the counter bits (C 20 downward) onto the memory address bits (A 18 through A 0) and the bits M a through M e. The legible address row for SPECTRUM = 1 reads A 1, A 0, A 3, A 2, A 6, A 5, A 4, A 9, A 8, A 7, A 12, A 11, A 10, A 14, A 13, A 16, A 15, A 18, A 17, followed by M c, M b, M e, M d, M a. The figure gives M 2 = M a; the expressions for M 1 and M 0, which involve M b and M c together with further M bits, are not fully legible in the source.]

Fig. 11. Final INBUF transformation sequence.



